A Heuristic Approach for Recognizing a Document's Language Used for the Internet Search Engine GETESS

Authors

  • Antje Raab-Düsterhöft
  • Sherry Gröticke
Abstract

In this paper, we illustrate how Internet documents can be automatically analyzed in order to identify the document’s language. This language knowledge is then used for the Internet search engine GETESS. The aim of the language-classification heuristics is to ensure that documents with the same content but different languages (e.g., in German and English) will not be presented to the user simultaneously as search results. The GETESS search engine only provides the results in the language relevant to the user. Consequently, the search-result set is narrower and more appropriately fits the needs of the user.
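The abstract does not spell out the heuristics themselves, so the following is only an illustrative sketch of one common approach, not the authors' method: count how many frequent function words ("stopwords") of each candidate language occur in a document, then keep only the search results whose guessed language matches the user's. The names `STOPWORDS`, `guess_language`, `filter_results`, and the `min_hits` threshold are assumptions made for this example, and the word lists are deliberately tiny.

```python
import re
from collections import Counter

# Hypothetical, very small stopword lists (assumption for illustration only;
# a practical system would need much larger lists or a different signal).
STOPWORDS = {
    "de": {"der", "die", "das", "und", "ist", "nicht", "ein", "eine", "mit", "für"},
    "en": {"the", "and", "is", "not", "a", "of", "with", "for", "in", "to"},
}


def guess_language(text, min_hits=2):
    """Return the language code with the most stopword hits, or None if unsure."""
    # Split into lowercase alphabetic tokens (Unicode letters, so umlauts survive).
    tokens = re.findall(r"[^\W\d_]+", text.lower())
    counts = Counter(tokens)
    scores = {
        lang: sum(counts[word] for word in words)
        for lang, words in STOPWORDS.items()
    }
    ranked = sorted(scores.values(), reverse=True)
    # Refuse to decide when evidence is too thin or the top two languages tie.
    if ranked[0] < min_hits or (len(ranked) > 1 and ranked[0] == ranked[1]):
        return None
    return max(scores, key=scores.get)


def filter_results(documents, user_language):
    """Keep only documents whose guessed language matches the user's language."""
    return [doc for doc in documents if guess_language(doc) == user_language]


if __name__ == "__main__":
    docs = [
        "Der Hund ist nicht mit der Katze in dem Haus und das ist gut.",
        "The dog is not with the cat in the house and that is fine.",
    ]
    print(filter_results(docs, "en"))
```

Returning `None` for short or ambiguous texts is a design choice in this sketch: it avoids forcing a label onto documents that offer too little evidence, at the cost of excluding them from language-filtered results.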





Publication date: 2000